Abstract

In this paper we derive an updating scheme for calculating some important network statistics,
such as degree and clustering coefficient, aiming to reduce the amount of computation needed to
track the evolving behavior of large networks and, more importantly, to provide efficient methods
for potential use in modeling the evolution of networks. Using the updating scheme, the network
statistics of large evolving networks can be updated easily and much faster than by recalculating
them at each time step. The update formulas can also be used to determine which edge or node
will lead to the extremal change in a network statistic, providing a way of predicting or designing
evolution rules for networks.
I. INTRODUCTION

Complex networks are useful tools for modeling complicated real-life objects and their
interactions. Examples include computer networks, social networks, biological networks,
etc. [3] [1] [9] [7] [8] [18]. Unlike the traditional graph-theoretic approach, which emphasizes
the micro-state quantities of each node in the network, recently developed statistical methods
[9] allow us to analyze large networks by summarizing a few important statistics out of the
massive amount of information carried by the network itself. These statistics include degree
(number of connections each node has), clustering coefficient [3], assortativity coefficient
[10], modularity measure [12], etc. Fast algorithms have been developed to compute
these statistics for any given network, represented either by an adjacency matrix or by an
edge list.

However, for an evolving network, measuring the corresponding evolution of the network
statistics with methods designed for static networks requires repeating the computation at
every time step, which becomes impractical even if each single computation is fast. What is
missing in the study of evolving networks is a dynamic algorithm that updates, rather than
recomputes, the statistics.

In this paper we present a dynamic algorithm based on knowledge of the existing
network structure and of the changes to the network. We will consider the adjacency matrix
as the default structure for representing a network [16]. The results hold very similarly if one
uses an edge list instead.

The rest of the paper is organized as follows. In Section II we review the definitions of some
network statistics and introduce the notation that will be used in the paper for these statistics.
In Section III we derive update formulas for network statistics upon a change of network
structure and compare their computational complexity with that of regular methods. In
Section IV we show examples of applications of the updating scheme. In Section V we discuss
the main results of the paper and give an overview of potential future research.

II. DEFINITION AND NOTATION 

A mathematical representation of a network is a graph $G = (V, E)$, where $V = \{1, 2, \dots, N\}$
is the vertex set and $E = \{(i,j) \mid i \text{ and } j \text{ are connected}\}$ is the edge set. Note that for
undirected graphs, if $(i,j) \in E$ then so is $(j,i)$. In this work, we limit ourselves
to undirected, unweighted networks; their graphs possess a symmetric, binary adjacency
matrix $A = [a_{ij}]$:

$$a_{ij} = \begin{cases} 1, & \text{if } (i,j) \in E; \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

With $M$ as the total number of edges in $G$, then

$$M = \frac{|E|}{2} = \frac{1}{2} \sum_{i,j} a_{ij}. \tag{2}$$

Here $|\cdot|$ is the cardinality of a set.

Define the neighborhood $N(i)$ of node $i$ as the set of vertices that are adjacent to $i$, i.e.:

$$N(i) = \{ j \mid (i,j) \in E \} = \{ j \mid a_{ij} = 1 \}. \tag{3}$$

Likewise, define the shared neighborhood $N_{ij}$ of nodes $i$ and $j$ as:

$$N_{ij} = N(i) \cap N(j). \tag{4}$$

The degree $k_i$ of node $i$ is the number of nodes it connects to:

$$k_i = |N(i)| = \sum_j a_{ij} = \sum_j a_{ji}, \tag{5}$$

since we limit ourselves to undirected networks.

The clustering coefficient of node $i$ is defined by [3]:

$$C_i = \begin{cases} \dfrac{2 \Delta_i}{k_i (k_i - 1)}, & \text{if } k_i \geq 2; \\ 0, & \text{otherwise,} \end{cases} \tag{6}$$

where $\Delta_i$ is the number of triangles that contain $i$. Then the average clustering coefficient
[18] of the whole network is simply the average of all $C_i$'s:

$$C = \frac{1}{N} \sum_{i} C_i. \tag{7}$$

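Another interesting quantity is the assortativity coefficient $r$ [10] of a network:

$$r = \frac{8 M u - v^2}{4 M w - v^2}, \tag{8}$$

where

$$u = \sum_{(i,j) \in E} k_i k_j, \tag{9}$$

$$v = \sum_{(i,j) \in E} (k_i + k_j), \tag{10}$$

$$w = \sum_{(i,j) \in E} (k_i^2 + k_j^2). \tag{11}$$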

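Modularity $Q$ is a quantity which measures the quality of a community partition,
typically defined as:

$$Q = \frac{1}{2M} \sum_{i,j} \left[ a_{ij} - \frac{k_i k_j}{2M} \right] \delta(g_i, g_j) = \frac{1}{2M} \left[ S_A - \frac{1}{2M} S_P \right], \tag{12}$$

where $\delta(g_i, g_j) = 1$ if nodes $i$ and $j$ are in the same group and zero otherwise, and

$$S_A = \sum_{i,j} a_{ij}\, \delta(g_i, g_j), \qquad S_P = \sum_{i,j} k_i k_j\, \delta(g_i, g_j). \tag{13}$$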
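As a reference point for the definitions of this section, the following is a minimal sketch of how the statistics could be computed from scratch for a static network. The function name `static_statistics`, the 0-based node labels, the neighbor-set representation, and the small test graph are illustrative assumptions, not part of the paper; a zero denominator in $r$ or $Q$ is mapped to zero by assumption.

```python
from itertools import combinations


def static_statistics(n, edges, groups):
    """Recompute the statistics of Section II from scratch.

    n      -- number of nodes, labelled 0..n-1 (0-based, unlike the text)
    edges  -- iterable of undirected pairs (i, j), each listed once
    groups -- groups[i] is the community label g_i of node i
    """
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    k = [len(s) for s in nbrs]                      # degrees k_i
    m = sum(k) // 2                                 # M = |E| / 2
    # Delta_i: pairs of neighbours of i that are themselves adjacent
    tri = [sum(1 for a, b in combinations(nbrs[i], 2) if b in nbrs[a])
           for i in range(n)]
    c_node = [2 * tri[i] / (k[i] * (k[i] - 1)) if k[i] >= 2 else 0.0
              for i in range(n)]
    c_avg = sum(c_node) / n
    # u, v, w are sums over ordered pairs (i, j) in E
    pairs = [(i, j) for i in range(n) for j in nbrs[i]]
    u = sum(k[i] * k[j] for i, j in pairs)
    v = sum(k[i] + k[j] for i, j in pairs)
    w = sum(k[i] ** 2 + k[j] ** 2 for i, j in pairs)
    den = 4 * m * w - v * v
    r = (8 * m * u - v * v) / den if den else 0.0   # assumed: 0 if undefined
    s_a = sum(1 for i, j in pairs if groups[i] == groups[j])
    s_p = sum(k[i] * k[j] for i in range(n) for j in range(n)
              if groups[i] == groups[j])
    q = (s_a - s_p / (2 * m)) / (2 * m) if m else 0.0
    return {"M": m, "k": k, "tri": tri, "C": c_avg,
            "u": u, "v": v, "w": w, "r": r, "Q": q}


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
stats = static_statistics(4, [(0, 1), (1, 2), (0, 2), (2, 3)], [0, 0, 0, 1])
```

Every quantity here is rebuilt from the full edge set, which is the cost that an updating scheme is meant to avoid when the network changes by one edge at a time.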

III. UPDATING SCHEMES FOR STATISTICS UPON LOCAL INFORMATION 
A. Adding an edge between existing nodes 

Suppose $a_{pq} = 0$ ($p \neq q$, and $p$, $q$ are not connected); we analyze the impact of connecting
$p$ and $q$ on the various statistics of the network. The goal is to derive computations that are
as inexpensive as possible. We use a tilde to represent updated statistics:

$$\tilde{E} = E \cup \{(p,q), (q,p)\}, \tag{14}$$

$$\tilde{M} = M + \Delta^+ M = M + 1, \tag{15}$$

and

$$\tilde{a}_{ij} = a_{ij} + \delta_{ip}\delta_{jq} + \delta_{iq}\delta_{jp}, \tag{16}$$

where $\delta_{ij}$ is the Kronecker delta and we use $\Delta^+$ to represent the change in a statistic
upon adding an edge to the existing network; in the following, $\Delta^-$ will be used to denote the
change in a statistic upon deleting an existing edge. We will not explicitly specify which edge is
added or deleted in the $\Delta^{\pm}$ notation when there is no confusion.

Based on the above formulas, we can derive schemes for efficiently updating network
statistics.
Degree 

The change in degree for node $i$ is simply:

$$\tilde{k}_i = k_i + \Delta^+ k_i = k_i + \delta_{ip} + \delta_{iq}, \tag{17}$$

where

$$\Delta^+ k_i = \delta_{ip} + \delta_{iq}. \tag{18}$$

The above formula indicates that the degree changes only for vertices $p$ and $q$, so that if one
keeps a list of the degrees of all vertices of the network, each update takes only 2 operations
when a new edge is added.
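Clustering Coefficient

To compute the new clustering coefficient of each node, and thus of the whole network, we
need the updated number of triangles at node $i$:

$$\tilde{\Delta}_i = \begin{cases} \Delta_i, & \text{if } i \notin \{p, q\} \cup N_{pq}; \\ \Delta_i + 1, & \text{if } i \in N_{pq}; \\ \Delta_i + |N_{pq}|, & \text{if } i \in \{p, q\}. \end{cases} \tag{19}$$

Combining this with Eq. (17) and $\Delta_i = \frac{1}{2} C_i k_i (k_i - 1)$, from Eq. (6), we have:

$$\tilde{C}_i = \begin{cases} C_i, & \text{if } i \notin \{p, q\} \cup N_{pq}; \\ C_i + \dfrac{2}{k_i (k_i - 1)}, & \text{if } i \in N_{pq}; \\ \dfrac{k_i - 1}{k_i + 1}\, C_i + \dfrac{2 |N_{pq}|}{k_i (k_i + 1)}, & \text{if } i \in \{p, q\}. \end{cases} \tag{20}$$

Note that whenever the denominator of a fraction is zero, we define the fraction to be zero,
in Eq. (20) and throughout. This maintains the consistency that $C_i = 0$ if $k_i < 2$. Finally,
the average clustering coefficient $C$ becomes:

$$\tilde{C} = C + \Delta^+ C = C + \frac{1}{N} \left[ \sum_{i \in N_{pq}} \frac{2}{k_i (k_i - 1)} + 2 \sum_{i \in \{p, q\}} \left( \frac{|N_{pq}|}{k_i (k_i + 1)} - \frac{C_i}{k_i + 1} \right) \right], \tag{21}$$

where

$$\Delta^+ C = \frac{1}{N} \left[ \sum_{i \in N_{pq}} \frac{2}{k_i (k_i - 1)} + 2 \sum_{i \in \{p, q\}} \left( \frac{|N_{pq}|}{k_i (k_i + 1)} - \frac{C_i}{k_i + 1} \right) \right]. \tag{22}$$

Note that to update the average clustering coefficient, we need to keep the clustering
coefficient of each node in order to apply the update formula, which implies an $O(N)$
storage complexity.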
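The per-node update of the clustering coefficient on edge addition can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the names `frac` and `add_edge_clustering`, the neighbor-set representation, and the 0-based labels are assumptions. It stores $C_i$ for every node, as the storage remark above requires, and returns the change in the average clustering coefficient.

```python
def frac(num, den):
    """Fraction with the convention that a zero denominator gives zero."""
    return num / den if den else 0.0


def add_edge_clustering(nbrs, c_node, p, q):
    """Add edge (p, q); update the stored C_i in place and return Delta^+ C.

    nbrs   -- list of neighbour sets (modified in place)
    c_node -- c_node[i] is the clustering coefficient C_i (modified in place)
    Assumes p != q and that p and q are not yet adjacent.
    """
    n = len(nbrs)
    shared = nbrs[p] & nbrs[q]               # shared neighbourhood N_pq
    delta = 0.0
    for i in shared:                         # i in N_pq gains one triangle
        k = len(nbrs[i])
        d = frac(2, k * (k - 1))
        c_node[i] += d
        delta += d
    for i in (p, q):                         # endpoints: degree and triangles change
        k = len(nbrs[i])                     # degree before the update
        new = frac(k - 1, k + 1) * c_node[i] + frac(2 * len(shared), k * (k + 1))
        delta += new - c_node[i]
        c_node[i] = new
    nbrs[p].add(q)                           # only now change the structure
    nbrs[q].add(p)
    return delta / n


# Closing the path 0-1-2 into a triangle.
nbrs = [{1}, {0, 2}, {1}]
c_node = [0.0, 0.0, 0.0]
delta_c = add_edge_clustering(nbrs, c_node, 0, 2)
```

Only the endpoints and their shared neighbors are visited, so the work per update is bounded by the degrees of $p$ and $q$ rather than by the size of the network.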

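Assortativity Coefficient

To compute $\tilde{r}$, we need $\tilde{u}$, $\tilde{v}$, and $\tilde{w}$. The update formula for $u$ is:

$$\begin{aligned}
\tilde{u} &= \sum_{(i,j) \in \tilde{E}} \tilde{k}_i \tilde{k}_j = \sum_{(i,j) \in E} \tilde{k}_i \tilde{k}_j + 2 (k_p + 1)(k_q + 1) \\
&= \sum_{(i,j) \in \hat{E}} k_i k_j + 2 \sum_{i \in N(p)} k_i (k_p + 1) + 2 \sum_{i \in N(q)} k_i (k_q + 1) + 2 (k_p + 1)(k_q + 1) \\
&= \sum_{(i,j) \in E} k_i k_j + 2 \left( \sum_{i \in N(p)} k_i + \sum_{i \in N(q)} k_i \right) + 2 (k_p + 1)(k_q + 1) \\
&= u + \Delta^+ u.
\end{aligned} \tag{23}$$

Here $\hat{E}$ is the edge set that contains all edges in $E$ except those incident to $p$ or $q$,
and

$$\Delta^+ u = 2 \left( \sum_{i \in N(p)} k_i + \sum_{i \in N(q)} k_i \right) + 2 (k_p + 1)(k_q + 1). \tag{24}$$

Similarly, we can obtain update formulas for $v$ and $w$:

$$\tilde{v} = \sum_{(i,j) \in \tilde{E}} (\tilde{k}_i + \tilde{k}_j) = v + 4 (k_p + k_q + 1) = v + \Delta^+ v, \tag{25}$$

where

$$\Delta^+ v = 4 (k_p + k_q + 1). \tag{26}$$

For $w$ we have:

$$\tilde{w} = \sum_{(i,j) \in \tilde{E}} (\tilde{k}_i^2 + \tilde{k}_j^2) = w + \Delta^+ w, \tag{27}$$

where

$$\Delta^+ w = 6 \left[ k_p (k_p + 1) + k_q (k_q + 1) \right] + 4. \tag{28}$$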
Finally, the new assortativity coefficient can be updated using:

$$\tilde{r} = r + \Delta^+ r = \frac{8 \tilde{M} \tilde{u} - \tilde{v}^2}{4 \tilde{M} \tilde{w} - \tilde{v}^2} = \frac{8 (M + 1)(u + \Delta^+ u) - (v + \Delta^+ v)^2}{4 (M + 1)(w + \Delta^+ w) - (v + \Delta^+ v)^2}. \tag{29}$$
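A minimal sketch of the assortativity update for edge addition is given below. The function name `add_edge_assortativity`, the `state` dictionary, and the 0-based labels are illustrative assumptions; the zero-denominator case is mapped to zero by assumption. The increments must be computed from the degrees and neighborhoods before the edge is inserted.

```python
def add_edge_assortativity(nbrs, state, p, q):
    """Add edge (p, q); update M, u, v, w in place and return the new r.

    nbrs  -- list of neighbour sets (modified in place)
    state -- dict with keys "M", "u", "v", "w" for the current network
    Assumes p != q and that p and q are not yet adjacent.
    """
    kp, kq = len(nbrs[p]), len(nbrs[q])          # degrees before the update
    # Sum of neighbour degrees of p and of q; this is the O(<k>) part.
    nbr_deg = (sum(len(nbrs[i]) for i in nbrs[p])
               + sum(len(nbrs[i]) for i in nbrs[q]))
    state["u"] += 2 * nbr_deg + 2 * (kp + 1) * (kq + 1)
    state["v"] += 4 * (kp + kq + 1)
    state["w"] += 6 * (kp * (kp + 1) + kq * (kq + 1)) + 4
    state["M"] += 1
    nbrs[p].add(q)
    nbrs[q].add(p)
    m, u, v, w = (state[x] for x in ("M", "u", "v", "w"))
    den = 4 * m * w - v * v
    return (8 * m * u - v * v) / den if den else 0.0


# Triangle 0-1-2 plus an isolated node 3; then attach node 3 to node 2.
nbrs = [{1, 2}, {0, 2}, {0, 1}, set()]
state = {"M": 3, "u": 24, "v": 24, "w": 48}
r_new = add_edge_assortativity(nbrs, state, 2, 3)
```

In this small example the updated sums are $u = 38$, $v = 36$, $w = 88$, which gives $r = -80/112 = -5/7$, the same value obtained by evaluating the definitions directly on the final graph.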
Modularity 

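For modularity, we assume that after connecting the nodes $p$ and $q$, the group labels $g_i$ do
not change for any node $i$. Then the new modularity measure will be:

$$\tilde{Q} = \frac{1}{2 \tilde{M}} \left[ \tilde{S}_A - \frac{1}{2 \tilde{M}} \tilde{S}_P \right]. \tag{30}$$

We already have $\tilde{M} = M + 1$; we now derive updating formulas for $S_A$ and $S_P$. By Eq. (13)
we have:

$$\tilde{S}_A = S_A + \Delta^+ S_A = \sum_{i,j} (a_{ij} + \delta_{ip}\delta_{jq} + \delta_{iq}\delta_{jp})\, \delta(g_i, g_j) = S_A + 2 \delta(g_p, g_q), \tag{31}$$

where $\Delta^+ S_A = 2 \delta(g_p, g_q)$, and

$$\begin{aligned}
\tilde{S}_P &= \sum_{i,j} \tilde{k}_i \tilde{k}_j\, \delta(g_i, g_j) = \sum_{i,j} (k_i + \delta_{ip} + \delta_{iq})(k_j + \delta_{jp} + \delta_{jq})\, \delta(g_i, g_j) \\
&= S_P + 2 \sum_i k_i \left[ \delta(g_i, g_p) + \delta(g_i, g_q) \right] + 2 \left[ \delta(g_p, g_q) + 1 \right].
\end{aligned} \tag{32}$$

However, computing the sum in Eq. (32) for every update is expensive. To avoid this,
define the following auxiliary statistics:

$$K_g = \sum_i k_i\, \delta(g_i, g), \tag{33}$$

with updating scheme

$$\tilde{K}_g = K_g + \delta(g_p, g) + \delta(g_q, g), \tag{34}$$

giving

$$\tilde{S}_P = S_P + \Delta^+ S_P = S_P + 2 (K_{g_p} + K_{g_q}) + 2 \left[ \delta(g_p, g_q) + 1 \right], \tag{35}$$

where $\Delta^+ S_P = 2 (K_{g_p} + K_{g_q}) + 2 \left[ \delta(g_p, g_q) + 1 \right]$.

Finally, combining (31) and (35) with (30) gives the updating scheme for $Q$:

$$\tilde{Q} = Q + \Delta^+ Q = \frac{1}{2 (M + 1)} \left\{ S_A + 2 \delta(g_p, g_q) - \frac{1}{2 (M + 1)} \left[ S_P + 2 (K_{g_p} + K_{g_q}) + 2 \left( \delta(g_p, g_q) + 1 \right) \right] \right\}. \tag{36}$$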



From Eq. (36) one can predict whether the modularity measure $Q$ increases or
decreases, given the existing partition of the graph and the edge to be
added. For example, if there is a preexisting partition of the graph into two groups and
a new edge is added between the two groups, then $\Delta^+ Q < 0$, i.e., the modularity
decreases. On the other hand, if a new edge is added between vertices belonging to the same
group, then the modularity increases if the edge is added to the group with smaller total
degree; however, adding an edge within a group does not necessarily increase $Q$ if the edge
is added to the group with larger total degree; see Fig. 1 for an example.
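The constant-time modularity update for edge addition can be sketched as follows. The function name `add_edge_modularity`, the `state` dictionary layout, and the 0-based labels are illustrative assumptions rather than the authors' code. Note that the group totals $K_g$ entering the update of $S_P$ are the values before the degrees of $p$ and $q$ are incremented.

```python
def add_edge_modularity(state, groups, p, q):
    """Add edge (p, q) under a fixed partition; return the new Q.

    state  -- dict with "M", "S_A", "S_P" and "K" (K[g] = total degree of group g)
    groups -- groups[i] is the label g_i of node i (unchanged by the update)
    Assumes p != q and that p and q are not yet adjacent.
    """
    gp, gq = groups[p], groups[q]
    same = 1 if gp == gq else 0                  # delta(g_p, g_q)
    # Use K_g from before the update, then refresh K_g afterwards.
    state["S_P"] += 2 * (state["K"][gp] + state["K"][gq]) + 2 * (same + 1)
    state["S_A"] += 2 * same
    state["K"][gp] += 1
    state["K"][gq] += 1
    state["M"] += 1
    m = state["M"]
    return (state["S_A"] - state["S_P"] / (2 * m)) / (2 * m)


# Triangle 0-1-2 forms group 0; isolated node 3 forms group 1 (Q = 0 initially).
groups = [0, 0, 0, 1]
state = {"M": 3, "S_A": 6, "S_P": 36, "K": {0: 6, 1: 0}}
q_new = add_edge_modularity(state, groups, 2, 3)
```

Here the added edge joins the two groups, and the modularity drops from 0 to $-0.03125$, in line with the statement that a between-group edge decreases $Q$.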



B. Connecting a New Node 

The operation of adding an edge to a new node can be decomposed into two successive
operations: first, introduce an isolated node that connects to nothing in the network; then
add an edge between this node and a previously existing node. We can use the previous results
for the second step and need only focus on the first, i.e., adding an isolated node to a network.

Since no new edge is introduced, it is easy to obtain the following updating relations:

$$\tilde{N} = N + 1, \qquad \tilde{M} = M, \qquad \tilde{E} = E, \tag{37}$$

and

$$\tilde{a}_{ij} = \begin{cases} a_{ij}, & \text{if } i \neq N + 1 \text{ and } j \neq N + 1; \\ 0, & \text{otherwise.} \end{cases} \tag{38}$$

Then for other statistics, we have:

$$\tilde{k}_i = k_i \ (i \leq N), \qquad \tilde{k}_{N+1} = 0; \tag{39}$$

and

$$\tilde{C}_i = C_i \ (i \leq N), \qquad \tilde{C}_{N+1} = 0, \tag{40}$$

so that

$$\tilde{C} = \frac{1}{N + 1} \sum_{i=1}^{N+1} \tilde{C}_i = \frac{1}{N + 1} \sum_{i=1}^{N} C_i = \frac{N}{N + 1}\, C. \tag{41}$$

Similarly, $\tilde{r} = r$ since $\tilde{u} = u$, $\tilde{v} = v$, and $\tilde{w} = w$; and $\tilde{Q} = Q$ since $\tilde{S}_A = S_A$ and $\tilde{S}_P = S_P$.

FIG. 1: (Color online) An example in which the modularity actually decreases when a new edge is
added between vertices within the same group. The dashed oval boxes indicate the preexisting
partition of the graph into two groups. Solid lines are the edges in the original graph. Before
adding the new edge (dashed arrow line), the modularity is 0.125. After a new edge is added
between two vertices in the same group (solid circles) the updated modularity becomes 0.1235.



C. Deleting an Existing Edge 

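Now we investigate how network statistics change when we delete an existing edge in
the network. Suppose $a_{pq} = 1$ ($p \neq q$, and $p$, $q$ are connected), and we delete this edge,
i.e., $(p,q)$ and $(q,p)$, from our edge set $E$. We use $\tilde{A}$ to represent the updated adjacency
matrix, and similarly for the other statistics. Then we immediately have:

$$\tilde{E} = E \setminus \{(p,q), (q,p)\}, \tag{42}$$

$$\tilde{M} = M + \Delta^- M = M - 1, \tag{43}$$

and

$$\tilde{a}_{ij} = a_{ij} - \delta_{ip}\delta_{jq} - \delta_{iq}\delta_{jp}. \tag{44}$$

Degree

The change in degree for node $i$ is:

$$\tilde{k}_i = k_i + \Delta^- k_i = k_i - \delta_{ip} - \delta_{iq}, \tag{45}$$

where

$$\Delta^- k_i = -\delta_{ip} - \delta_{iq}. \tag{46}$$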



Clustering Coefficient 

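For the new clustering coefficient, we first obtain the formula for updating the number
of triangles containing node $i$:

$$\tilde{\Delta}_i = \begin{cases} \Delta_i, & \text{if } i \notin \{p, q\} \cup N_{pq}; \\ \Delta_i - 1, & \text{if } i \in N_{pq}; \\ \Delta_i - |N_{pq}|, & \text{if } i \in \{p, q\}. \end{cases} \tag{47}$$

Then we obtain the formula for updating $C_i$:

$$\tilde{C}_i = \begin{cases} C_i, & \text{if } i \notin \{p, q\} \cup N_{pq}; \\ C_i - \dfrac{2}{k_i (k_i - 1)}, & \text{if } i \in N_{pq}; \\ \dfrac{k_i}{k_i - 2}\, C_i - \dfrac{2 |N_{pq}|}{(k_i - 1)(k_i - 2)}, & \text{if } i \in \{p, q\}. \end{cases} \tag{48}$$

The average clustering coefficient $C$ is updated by:

$$\tilde{C} = C + \Delta^- C = C - \frac{1}{N} \left[ \sum_{i \in N_{pq}} \frac{2}{k_i (k_i - 1)} + 2 \sum_{i \in \{p, q\}} \left( \frac{|N_{pq}|}{(k_i - 1)(k_i - 2)} - \frac{C_i}{k_i - 2} \right) \right], \tag{49}$$

where

$$\Delta^- C = -\frac{1}{N} \left[ \sum_{i \in N_{pq}} \frac{2}{k_i (k_i - 1)} + 2 \sum_{i \in \{p, q\}} \left( \frac{|N_{pq}|}{(k_i - 1)(k_i - 2)} - \frac{C_i}{k_i - 2} \right) \right]. \tag{50}$$

When $k_i = 2$ for $i \in \{p, q\}$, the updated degree is 1 and $\tilde{C}_i = 0$, so the contribution of
node $i$ to $\Delta^- C$ is $-C_i / N$ rather than the value obtained from the zero-denominator
convention.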
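The deletion update for the clustering coefficient can also be organized around stored triangle counts, which avoids special handling of small degrees. The sketch below is one such arrangement under stated assumptions: the names `clustering` and `delete_edge_clustering`, the neighbor-set representation, and the 0-based labels are hypothetical, and $C_i$ is recomputed from $\Delta_i$ and $k_i$ for the affected nodes instead of being patched in closed form.

```python
def clustering(nbrs, tri, i):
    """C_i from the stored triangle count; zero when k_i < 2."""
    k = len(nbrs[i])
    return 2 * tri[i] / (k * (k - 1)) if k >= 2 else 0.0


def delete_edge_clustering(nbrs, tri, c_avg, p, q):
    """Delete edge (p, q); update nbrs and tri in place, return the new average C.

    Assumes p and q are currently adjacent.
    """
    n = len(nbrs)
    shared = nbrs[p] & nbrs[q]                   # N_pq, taken before deletion
    touched = shared | {p, q}                    # the only nodes whose C_i can change
    old = sum(clustering(nbrs, tri, i) for i in touched)
    for i in shared:
        tri[i] -= 1                              # triangle (i, p, q) disappears
    tri[p] -= len(shared)
    tri[q] -= len(shared)
    nbrs[p].discard(q)
    nbrs[q].discard(p)
    new = sum(clustering(nbrs, tri, i) for i in touched)
    return c_avg + (new - old) / n


# Square 0-1-2-3 with the diagonal (0, 2); removing the diagonal leaves no triangles.
nbrs = [{1, 2, 3}, {0, 2}, {0, 1, 3}, {0, 2}]
tri = [2, 1, 2, 1]
c_new = delete_edge_clustering(nbrs, tri, 5 / 6, 0, 2)
```

Storing $\Delta_i$ instead of $C_i$ keeps the same $O(N)$ storage and touches the same set of nodes, $\{p, q\} \cup N_{pq}$.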



Assortativity Coefficient 

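The updating formulas for $u$, $v$, $w$ are:

$$\begin{aligned}
\tilde{u} &= u - 2 \left( \sum_{i \in N(p)} k_i + \sum_{i \in N(q)} k_i \right) - 2 (k_p - 1)(k_q - 1) + 2, \\
\tilde{v} &= v - 4 (k_p + k_q - 1), \\
\tilde{w} &= w - 6 \left[ k_p (k_p - 1) + k_q (k_q - 1) \right] - 4,
\end{aligned} \tag{51}$$

where the neighborhoods $N(p)$, $N(q)$ and the degrees $k_i$ are those of the network before the
deletion (so that $q \in N(p)$ and $p \in N(q)$). Let

$$\begin{aligned}
\Delta^- u &= -2 \left( \sum_{i \in N(p)} k_i + \sum_{i \in N(q)} k_i \right) - 2 (k_p - 1)(k_q - 1) + 2, \\
\Delta^- v &= -4 (k_p + k_q - 1), \\
\Delta^- w &= -6 \left[ k_p (k_p - 1) + k_q (k_q - 1) \right] - 4.
\end{aligned} \tag{52}$$

Then we have:

$$\tilde{u} = u + \Delta^- u, \qquad \tilde{v} = v + \Delta^- v, \qquad \tilde{w} = w + \Delta^- w, \tag{53}$$

and the new assortativity coefficient $\tilde{r}$ is given by:

$$\tilde{r} = \frac{8 \tilde{M} \tilde{u} - \tilde{v}^2}{4 \tilde{M} \tilde{w} - \tilde{v}^2} = \frac{8 (M - 1)(u + \Delta^- u) - (v + \Delta^- v)^2}{4 (M - 1)(w + \Delta^- w) - (v + \Delta^- v)^2}. \tag{54}$$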

Modularity 

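For modularity, we again assume that the community partitions $g_i$ are unchanged after
disconnecting the edge between $p$ and $q$. It follows that

$$\tilde{S}_A = S_A + \Delta^- S_A = S_A - 2 \delta(g_p, g_q), \tag{55}$$

$$\tilde{S}_P = S_P + \Delta^- S_P = S_P - 2 (K_{g_p} + K_{g_q}) + 2 \left[ \delta(g_p, g_q) + 1 \right], \tag{56}$$

where $K_g$ is now updated using:

$$\tilde{K}_g = K_g + \Delta^- K_g = K_g - \delta(g_p, g) - \delta(g_q, g). \tag{57}$$

These now define the updating scheme for $\tilde{Q} = (\tilde{S}_A - \tilde{S}_P / 2\tilde{M}) / 2\tilde{M}$.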
D. On Computational Complexity

In Table I we compare the computational complexity of the updating scheme (which
depends on existing knowledge of the statistics) with that of regular methods. Note that for
regular methods the operation count depends on the data structure used to represent the network,
and will be different in general. The updating scheme requires $O(1)$ operations per update for
sparse graphs, and at most $O(\langle k \rangle)$, which is a significant advantage over regular
methods when the graph becomes large.

TABLE I: Comparison of Computational Complexity

Statistics                          Updating Scheme                    Adjacency Matrix                 Edge List
----------------------------------  ---------------------------------  -------------------------------  -------------------------------
degree (one node)                   $O(1)$                             $O(N)$                           $O(\langle k \rangle)$
degree (network)                    $O(1)$                             $O(N^2)$                         $O(\langle k \rangle N)$
clustering coefficient (one node)   $O(1)$ / $O(\langle k \rangle)$    $O(\langle k \rangle N)$         $O(\langle k \rangle^3)$
clustering coefficient (network)    $O(\langle k \rangle)$             $O(\langle k \rangle N^2)$       $O(\langle k \rangle^3 N)$
assortativity coefficient           $O(\langle k \rangle)$             $O(N^2)$                         $O(\langle k \rangle N)$
modularity measure                  $O(1)$                             $O(N^2)$                         $O(\langle k \rangle N)$

Our primary focus is developing efficient algorithms for application to problems of dynamic
networks, where the computational savings are significant. However, one may also consider
the process of building a network, which can be viewed simply as an edge-adding algorithm
starting from a graph with $N$ nodes and no edges. Then it takes $M = N \langle k \rangle / 2$ steps to
create the network. So the formulas for degree and modularity indicate that computing the
entire time sequence of statistics has the same computational complexity as doing the single
computation for the final state (using the edge list). For the clustering coefficient,
calculating each value along the way is more efficient than the single computation
of the final state, although we also need to take the operations of building the network into
account and (possibly) extra storage. The time sequence of assortativity coefficients requires
an additional factor $\langle k \rangle$, which is a minor price.
IV. EXAMPLES OF APPLICATION

In this section we show implementations of the above formulas to obtain the evolution
of some network statistics. We will focus on the case of adding edges between existing
nodes; the other two operations are very similar. The statistics we will calculate are the
degree distribution, average clustering coefficient and modularity measure, although again,
the evolution of other statistics can be obtained in the same manner by using the updating
scheme. The evolving network models we choose are not intended to mimic real-world networks,
but to show the efficiency of the updating scheme.

A. Evolution of Degree and Clustering Coefficient 

We implement the updating scheme to track the evolution of the degree distribution and
average clustering coefficient of a growing random graph [5].

The growing graph is obtained as follows: start with a random graph of fixed size
$N = 1000$ with average degree $\langle k \rangle = 10$; then at each time step, randomly choose two
nodes that are not connected and add an edge between them, until the average degree of
the network reaches $\langle k \rangle = 20$.

Fig. 2 and Fig. 3 show the evolution of a typical realization of the above growing model.
The total number of time steps is 5000, which is $O(N)$ in this case. Note that using the
updating scheme to obtain the evolution of the degree in this case requires $O(N^2)$ operations
(mostly for the initial calculation), while the regular method would require $O(N^3)$ operations
(using the adjacency matrix); for the average clustering coefficient the updating scheme requires
$O(\langle k \rangle N^2)$ operations and direct computation would require $O(\langle k \rangle N^3)$ operations
(also for the adjacency matrix format). The comparison is very similar for the
edge list representation.
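The growing random graph and the constant-time degree bookkeeping can be sketched as below. This is an illustration under stated assumptions, not the authors' experiment code: the function name `grow_random_graph`, the fixed seed, and the choice to build the initial graph by the same random edge additions (a fixed-edge-count random graph) are all hypothetical. The degree histogram is adjusted for the two endpoints only, following the degree update for edge addition.

```python
import random
from collections import Counter


def grow_random_graph(n=1000, k_start=10, k_end=20, seed=0):
    """Grow a random graph by single-edge additions, tracking the degree histogram.

    Returns (degree, hist, steps): hist[d] is the number of nodes of degree d,
    and steps counts the edges added after the average degree reached k_start.
    """
    rng = random.Random(seed)
    edges = set()
    degree = [0] * n
    hist = Counter({0: n})

    def add_random_edge():
        while True:                      # rejection-sample an unconnected pair
            p, q = rng.sample(range(n), 2)
            e = (min(p, q), max(p, q))
            if e not in edges:
                break
        edges.add(e)
        for x in (p, q):                 # only k_p and k_q change
            hist[degree[x]] -= 1
            degree[x] += 1
            hist[degree[x]] += 1

    while 2 * len(edges) < k_start * n:  # initial graph with <k> = k_start
        add_random_edge()
    steps = 0
    while 2 * len(edges) < k_end * n:    # growth phase up to <k> = k_end
        add_random_edge()
        steps += 1
    return degree, hist, steps


degree, hist, steps = grow_random_graph()
```

With $N = 1000$ the growth phase adds 5000 edges, matching the number of time steps quoted above, and each step changes four histogram entries at most.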

B. Evolution of Modularity 

We artificially create an initial network with a clear partition. The initial network is
constructed as follows: generate an empty graph of $N$ vertices, and prescribe a partition of the
set $\{1, 2, \dots, N\}$ into two groups with sizes $N_1$, $N_2$ (such that $N_1 + N_2 = N$)
and probabilities $p_1$, $p_2$, $p_{between}$. Randomly connect any pair of vertices in group 1 with
probability $p_1$, and those in group 2 with probability $p_2$; then randomly connect a vertex in
group 1 to a vertex in group 2 with probability $p_{between}$. $p_{between}$ is usually chosen to be
smaller than $p_1$ and $p_2$ so that the community structure is clear.

FIG. 2: (Color online) Evolution of the degree distribution of a random growing network. The number
of vertices is 5000 in the network. Initially the connection probability of any pair of vertices is
0.01; by adding random edges to the network, this probability increases to 0.02 in the end. We show
three views of the evolution of the degree distribution with respect to the process of adding
successive random edges. In the left and middle panels we see that at any given time the distribution
mimics a Poisson distribution, and the peak moves toward larger degree as time increases; the
right panel gives a top view of the evolution.

[Plot: average clustering coefficient versus time step, 0 to 5000.]

FIG. 3: (Color online) Evolution of the average clustering coefficient $C$ of a random growing
network (described in Fig. 2). The blue curve is the actual evolution of $C$, and the red dashed line
is the theoretical result given by $C_t \approx \langle k \rangle / N$, where $\langle k \rangle$ is the average degree at that
time instant.

In our example, we choose $N = 1000$, with group 1 the set of nodes $\{1, 2, \dots, 500\}$ and the
rest group 2, so that $N_1 = N_2 = 500$. We also let $p_1 = p_2 = 0.2$ and $p_{between} = 0.05$. Then we
add random edges between the groups until the probability of a connection between the groups
is the same as the probability of a connection inside the groups (resulting in a completely
random network in the end). Fig. 4, Fig. 5 and Fig. 6 show how the modularity is affected by this
process.
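The experiment described here can be sketched as follows, with the group sizes scaled down from 500 to 50 so that the example runs quickly. All names (`planted_partition`, `modularity_terms`, `q_value`, `evolve`), the seed, and the reduced sizes are illustrative assumptions. Each added between-group edge updates $S_P$ and the group totals $K_g$ in constant time, and the tracked value is compared with a direct computation at the end.

```python
import random


def planted_partition(n1, n2, p1, p2, p_between, rng):
    """Two-group random graph; returns (groups, edges) with edges as (i, j), i < j."""
    groups = [0] * n1 + [1] * n2
    p_in = (p1, p2)
    edges = set()
    for i in range(n1 + n2):
        for j in range(i + 1, n1 + n2):
            prob = p_in[groups[i]] if groups[i] == groups[j] else p_between
            if rng.random() < prob:
                edges.add((i, j))
    return groups, edges


def modularity_terms(groups, edges):
    """Direct computation of S_A, S_P and the group degree totals K_g."""
    K = {g: 0 for g in set(groups)}
    for i, j in edges:
        K[groups[i]] += 1
        K[groups[j]] += 1
    s_a = 2 * sum(1 for i, j in edges if groups[i] == groups[j])
    s_p = sum(x * x for x in K.values())     # sum_ij k_i k_j delta = sum_g K_g^2
    return s_a, s_p, K


def q_value(s_a, s_p, m):
    return (s_a - s_p / (2 * m)) / (2 * m)


def evolve(n1=50, n2=50, p_in=0.2, p_between=0.05, seed=1):
    """Add between-group edges until their density reaches p_in; track Q."""
    rng = random.Random(seed)
    groups, edges = planted_partition(n1, n2, p_in, p_in, p_between, rng)
    s_a, s_p, K = modularity_terms(groups, edges)
    m = len(edges)
    history = [q_value(s_a, s_p, m)]
    missing = [(i, j) for i in range(n1) for j in range(n1, n1 + n2)
               if (i, j) not in edges]
    rng.shuffle(missing)
    n_between = n1 * n2 - len(missing)
    while n_between < p_in * n1 * n2 and missing:
        p, q = missing.pop()
        edges.add((p, q))
        # Between-group edge: delta(g_p, g_q) = 0, so S_A is unchanged.
        s_p += 2 * (K[groups[p]] + K[groups[q]]) + 2
        K[groups[p]] += 1
        K[groups[q]] += 1
        m += 1
        n_between += 1
        history.append(q_value(s_a, s_p, m))
    return groups, edges, history


groups, edges, history = evolve()
```

The list `history` holds the modularity after every added edge; plotting it against the step index would give a curve of the kind shown for the full-size experiment.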




FIG. 4: Spy plot of the adjacency matrix at three specific instances. The left panel corresponds to
the initial network ($p_1 = p_2 = 0.2$ and $p_{between} = 0.05$), where there is a clear community
structure. The middle panel corresponds to the time when $p_{between}$ reaches 0.1, where the community
structure becomes less apparent. The right panel is the end of the growing process, where
$p_{between} = 0.2$ and the network is totally random with no community structure.





[Plots: three panels, horizontal axis: permuted i.]

FIG. 5: Components of the Fiedler vector [11] at three specific time instances (see Fig. 4). In the
three lower panels we plot the corresponding sorted components of the Fiedler vector.
FIG. 6: (Color online) The evolution of modularity $Q$. Three red circles correspond to the time
instances that are shown in Fig. 4 and Fig. 5.

V. DISCUSSION AND CONCLUSION 

In this paper we derive update formulas for important network statistics (degree, clustering
coefficient, assortativity coefficient, modularity), to provide theoretical tools for analyzing
the evolution of large evolving networks. The update formulas are based on single edge or node
updates; in general, any update of the graph can be decomposed into these basic
one-edge (or one-node) operations and carried out using the formulas we present in this paper. We
also present several examples to illustrate the use of the updating scheme. It is the use of update
formulas that allows us to efficiently track the evolution of network statistics, while traditional
methods require many more operations and become impractical.

The derivation of update formulas requires that the statistics depend locally on the network
structure; for example, the update formula for the clustering coefficient only requires
knowledge of local information about the vertices that are going to be connected. It becomes very
hard, or maybe even impossible, to derive exact update formulas for statistics that depend on
global information of the whole network, for example, the diameter or the Fiedler vector of
the network. However, the change in some of these global statistics can be bounded if there
is only a small change in the graph. For example, the change in the spectra and eigenvectors
(including the Fiedler vector) of the graph Laplacian upon adding or deleting a few edges in
the graph may be bounded by well-known perturbation results such as those in pQ and [2]. 